
Check CUDA out of memory #213

Draft · thodkatz wants to merge 5 commits into base: main

Conversation

@thodkatz (Collaborator) commented Aug 2, 2024

In conjunction with ilastik/ilastik#2891, our eventual goal is to find the maximum possible tensor shape that can fit into GPU memory for a simple forward pass.

To accomplish this, tiktorch should provide clients with the functionality to probe the GPU memory.

Two procedures have been added:

  • IsCudaOutOfMemory(shape) -> bool: checks whether a given shape fits into GPU memory.
  • MaxCudaMemoryShape(minShape, maxShape, step) -> shape: for the range [minShape, maxShape] and valid increments of step, returns the maximum shape that fits into GPU memory.

Exposing the first one can still be valuable when we have a good guess for a shape, since the second one can be very time-consuming, depending on the range.

The implementation of probing the GPU was also inspired by plant-seg.
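
For illustration, a minimal sketch of the probing idea (not the actual tiktorch implementation; the function names and the bare model callable are assumptions): allocate a dummy tensor of the candidate shape, attempt a forward pass, and treat a CUDA out-of-memory error as "does not fit". The scan below shrinks all axes in lockstep, which is one simple strategy for MaxCudaMemoryShape.

import torch

def is_cuda_out_of_memory(model: torch.nn.Module, shape, device: str = "cuda") -> bool:
    """Probe whether a forward pass with a dummy tensor of `shape` exhausts GPU memory."""
    try:
        with torch.no_grad():
            dummy = torch.zeros(shape, device=device)
            model.to(device)(dummy)
        return False
    except RuntimeError as error:
        if "out of memory" in str(error).lower():
            torch.cuda.empty_cache()  # release the partially allocated blocks before returning
            return True
        raise  # unrelated runtime errors should propagate

def max_cuda_memory_shape(model, min_shape, max_shape, step, device: str = "cuda"):
    """Walk from max_shape down towards min_shape in per-axis increments of step;
    return the largest shape that fits, or None if nothing in the range fits."""
    shape = list(max_shape)
    while all(size >= lo for size, lo in zip(shape, min_shape)):
        if not is_cuda_out_of_memory(model, tuple(shape), device):
            return tuple(shape)
        smaller = [size - st for size, st in zip(shape, step)]
        if smaller == shape:  # all steps are zero, nothing left to shrink
            break
        shape = smaller
    return None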

Note for the review:
inference_pb2_grpc.py and inference_pb2.py are auto-generated by make protos.

@k-dominik (Collaborator) left a comment

Awesome! Great that you got into tiktorch so quickly!

I left some minor comments and a question about how we want to see the interface for this functionality (basically ints vs namedints)...

environment.yml
proto/inference.proto (outdated)
tiktorch/server/grpc/inference_servicer.py
tiktorch/server/grpc/inference_servicer.py (outdated)
@thodkatz (Collaborator, Author) commented Aug 6, 2024

hey @k-dominik :)

I have attempted to refactor the design of the ModelInfo and the concept of shapes.

The previous design of the ModelInfo was coupled to the representation of the ModelSession message used for the grpc communication. The new design creates a more feature-rich interface to work easily with the concept of a shape, and the conversion of the ModelInfo into the message transferred by the server is localized (info2session).

With this interface, clients can also transform the data from the grpc channel back into a feature-rich object with session2info().

Do you think that the design is on the right track?

self._min_shape = min_shape
self._steps = steps
assert self._min_shape.is_same_axes(self._steps)
assert all(step == 0 for axis, step in steps if axis not in AxisWithValue.SPATIAL_AXES) # todo: ?
@thodkatz (Collaborator, Author):

I am not sure if this is valid. Does it make sense to allow non-zero step values for increments of non-spatial axes?

@k-dominik (Collaborator) left a comment

Regarding the refactor of ModelInfo: in general I like the idea of having something more directly usable :), good idea!
On the implementation details of the "rich", nicer-to-use objects, I left comments that mostly leaned towards subclassing dict, but those are probably outdated now that I think about it... Actually, what do you think about investigating whether bioimageio.spec classes can be reused for this purpose? At least for me, any removed layer of transformation would reduce cognitive load. On the other hand, this would bind us to bioimageio.spec, at least for the metadata. But since that is the format we support (and the only one), I don't see much of a problem.
Of course, then I immediately ask myself why we even do this dance and don't just parse the model on the client as well and be done with it (we already parse the spec on the client (ilastik) side)...

Also the ModelInfo (and related convenience classes) would need a very heavy refactor when updating to 0.5 of the spec.

idk, what do you think @thodkatz?

tiktorch/server/session/process.py (8 review threads, outdated/resolved)
@thodkatz thodkatz marked this pull request as draft August 7, 2024 12:52
Since we removed the ModelInfo interface, the protobuf messages for InputShape and OutputShape, and their conversions, are redundant
Two procedures have been added:
- Get the maximum tensor shape
- Check if a tensor's shape fits into memory
The current interface supports multiple device ids. To check whether a CUDA memory request is valid, meaning that a GPU is detected, a device id is needed so the check can be done against the available devices, if any.
@thodkatz (Collaborator, Author) commented Aug 19, 2024

Since the PR of removing the ModelInfo was merged, I have rebased this PR.

  • I have removed some leftover code from the previous ModelInfo interface.
  • I have addressed the comments regarding GPU detection by adding a deviceId field to the CUDA memory requests and using the function check_gpu_exists.
  • Shapes are consistent across protobuf requests. A NamedInt is expected.
  • Added tests for invalid MaxCudaMemoryShape requests.
  • Since we are going to rely on bioimageio spec classes, I have removed the fancy objects mentioned in the refactoring stage.

@thodkatz thodkatz marked this pull request as ready for review August 19, 2024 13:07
def get_axes_with_size(axes: Tuple[str, ...], shape: Tuple[int, ...]) -> NamedShape:
    if len(axes) != len(shape):
        raise ValueError(f"{axes} and {shape} incompatible length. It should be equal")
    InputTensorValidator.is_shape(shape)
@thodkatz (Collaborator, Author):

oops, this check should actually do something


def _check_shape_explicit(self, spec: nodes.InputTensor, tensor_shape: NamedShape):
    assert self.is_shape_explicit(spec)
    reference_shape = {name: size for name, size in zip(spec.axes, spec.shape)}
@thodkatz (Collaborator, Author):

we could use get_axes_with_size here


def _check_shape_parameterized(self, spec: nodes.InputTensor, tensor_shape: NamedShape):
    assert isinstance(spec.shape, ParametrizedInputShape)
    if not self.is_shape(tensor_shape.values()):
@thodkatz (Collaborator, Author) commented Aug 20, 2024:

If this is part of get_axes_with_size, which returns a NamedShape where we assume a valid named shape (a map from axis names to their sizes, which are natural numbers), then we can remove this check from here and integrate it into get_axes_with_size. Maybe Dict[str, int] could be an actual class rather than just a type alias, to enforce this. I should maybe have kept AxesWithValue.

@thodkatz (Collaborator, Author) commented Aug 20, 2024

Now that I think about it, we can improve the design:

What we actually need is just a validated xr.DataArray, i.e. tensors, and not really plain shapes. The xr.DataArray is convenient since it already contains the logic of dims, which should represent the model axes. So we can use Sample to create only validated Samples. That way we keep the validation in one place. Currently, there is no way to tell whether a sample is actually valid, and we don't actually need valid shapes, but valid tensors.

The xr.DataArray will also handle the logic for potentially negative values in the shape, so some checks can be skipped.

The reasoning for needing valid tensors can be understood by checking the implementations of the procedures Predict, IsCudaOutOfMemory and MaxCudaMemoryShape: eventually everything is realized as a tensor to be forwarded to the model.
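
As a rough sketch of that direction (the function name and validation rules here are hypothetical, not the current tiktorch API): build the xr.DataArray from the axes and the shape in one place, and reject malformed inputs before any Sample is created.

import numpy as np
import xarray as xr

def make_validated_tensor(axes, shape) -> xr.DataArray:
    """Create a zero-filled tensor whose dims are the model axes, rejecting malformed shapes early."""
    if len(axes) != len(shape):
        raise ValueError(f"{axes} and {shape} have incompatible lengths")
    if any(size <= 0 for size in shape):
        raise ValueError(f"{shape} must contain only positive sizes")
    return xr.DataArray(np.zeros(shape, dtype="float32"), dims=axes)

# e.g. make_validated_tensor(("b", "c", "y", "x"), (1, 1, 128, 128))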

@k-dominik (Collaborator) left a comment

Having this functionality is really great! I can already see us using GPUs more efficiently!

In general it would be great to add docstrings to methods, especially if they are some sort of API. Take IsCudaOutOfMemory as an example: given the name, I'd assume that one can query whether the current device is out of memory, but it looks like it is intended to check whether a tensor with a given shape would fit.

Also, the scope of what tiktorch supports in terms of models should be noted at least somewhere. When I read your code, I think you assume single input tensors (the out-of-memory functionality should also add any additional tensors the model expects, with default sizes I guess, otherwise errors will be thrown). Do you think that limitation would be hard to lift?

Currently the out-of-memory tests will also not run on Mac because the model has pytorch state-dict weights (there is currently a bug that prevents this from working). An easy workaround would be switching to torchscript. I think this bug was fixed in current bioimageio.spec/core, but we're not there yet :). (Maybe we should add CI on a different platform too... not in this PR ;))

I also noticed that there aren't any tests for InputTensorValidator - which would be great to have for completeness.

    return client.api.forward(sample)

def _check_gpu_exists(self, client: BioModelClient, device_id: str):
    gpu_device_ids = [device.id for device in self.__device_pool.list_devices() if device.id.startswith("cuda")]
@k-dominik (Collaborator):

Devices could also be "mps" (on Apple silicon).

    if len(gpu_device_ids) == 0:
        raise ValueError("Not available gpus found")
    if device_id not in client.devices:
        raise ValueError(f"{device_id} not found for model {client.name}")
@k-dominik (Collaborator):

I guess one additional check would be if device_id in gpu_device_ids?
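
Putting the three checks together might look roughly like this (just a sketch against the snippet above, not a suggested final wording of the error messages):

def _check_gpu_exists(self, client: BioModelClient, device_id: str):
    # collect the cuda devices known to the pool
    gpu_device_ids = [device.id for device in self.__device_pool.list_devices() if device.id.startswith("cuda")]
    if len(gpu_device_ids) == 0:
        raise ValueError("No available GPUs found")
    if device_id not in gpu_device_ids:
        raise ValueError(f"{device_id} is not an available GPU device")
    if device_id not in client.devices:
        raise ValueError(f"{device_id} not found for model {client.name}")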

Comment on lines +82 to +84
max_shape_arr = np.array(list(max_shape.values()))
min_shape_arr = np.array(list(param_shape.min_shape.values()))
step_arr = np.array(list(param_shape.step_shape.values()))
@k-dominik (Collaborator):

I'm always afraid of relying on two dicts having the same order... The prior check_same_axes is invariant to the order of the keys. I'd probably do something like this, with a fixed reference for the order:

Suggested change:
- max_shape_arr = np.array(list(max_shape.values()))
- min_shape_arr = np.array(list(param_shape.min_shape.values()))
- step_arr = np.array(list(param_shape.step_shape.values()))
+ max_shape_arr = np.array([max_shape[k] for k in max_shape])
+ min_shape_arr = np.array([param_shape.min_shape[k] for k in max_shape])
+ step_arr = np.array([param_shape.step_shape[k] for k in max_shape])

but maybe that's overdoing it - spec probably guarantees the order?

        return max_shape
    return None

def _is_cuda_out_of_memory(self, client: BioModelClient, tensor_id: str, shape: NamedShape) -> bool:
@k-dominik (Collaborator):

This method seems to assume that there will be only a single tensor in the sample... While ilastik can only deal with one, tiktorch probably could deal with multiple inputs.

Comment on lines +248 to +249
MAX_SHAPE = (1, 1, 10, 10)
AXES = ("b", "c", "y", "x")
@k-dominik (Collaborator):

Maybe note that these values relate to the model being used - otherwise it's a bit hard to follow how MAX_SHAPE is enforced (I was first looking for some monkeypatching somewhere).

@thodkatz (Collaborator, Author):

Great! Thank you very much @k-dominik for the review once more :)

I totally agree with the statements and will attempt to resolve them. Could you please also have a look at this comment, #213 (comment)? I think a little bit of refactoring will remove some redundant checks and make things more readable as well :)

@thodkatz (Collaborator, Author) commented Aug 20, 2024

> Currently the out-of-memory tests will also not run on Mac because the model has pytorch state-dict weights (there is currently ilastik/ilastik#2827 that prevents this from working). An easy workaround would be switching to torchscript. I think this bug was fixed in current bioimageio.spec/core, but we're not there yet :). (Maybe we should add CI on a different platform too... not in this PR ;))

Regarding this, without actually knowing it, I had attempted to fix the weight conversion functionality of bioimageio core in this PR! But I think this should indeed be handled by the core, and tiktorch shouldn't know about the weights format. But yeah, let's keep the CI for another PR :)

> Also, the scope of what tiktorch supports in terms of models should be noted at least somewhere. When I read your code, I think you assume single input tensors (the out-of-memory functionality should also add any additional tensors the model expects, with default sizes I guess, otherwise errors will be thrown). Do you think that limitation would be hard to lift?

Yep, you are right! I think it is time for the concept of Sample even from the protobuf perspective, so everything will be consistent with the spec too :) I would assume that the client should provide a Sample to check the out-of-memory functionality, so we don't really have to guess what to do with the other tensors if they are not specified. The Sample does increase the complexity a lot though: if you have, for example, two inputs with parameterized shapes and I want to test a range of shapes, should I create a grid of all the possible combinations, or should the shapes be updated synchronously?

@k-dominik (Collaborator):

> Also, the scope of what tiktorch supports in terms of models should be noted at least somewhere. When I read your code, I think you assume single input tensors (the out-of-memory functionality should also add any additional tensors the model expects, with default sizes I guess, otherwise errors will be thrown). Do you think that limitation would be hard to lift?

> Yep, you are right! I think it is time for the concept of Sample even from the protobuf perspective, so everything will be consistent with the spec too :) I would assume that the client should provide a Sample to check the out-of-memory functionality, so we don't really have to guess what to do with the other tensors if they are not specified. The Sample does increase the complexity a lot though: if you have, for example, two inputs with parameterized shapes and I want to test a range of shapes, should I create a grid of all the possible combinations, or should the shapes be updated synchronously?

It's a bit of a pity to reason about this still with the old spec in mind. In the new one, as I understood from Fynn, one would go by reference axes... And I suppose yes, a (hyper-)grid would be the correct solution. But for some reason that sounds too complicated as a result. Poof, this needs more thinking...
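
For reference, the hyper-grid could be enumerated with itertools.product, roughly like this (a sketch only; the search space grows multiplicatively with the number of parameterized inputs, which is exactly the complexity concern above):

import itertools

def candidate_shapes(min_shape, max_shape, step):
    """All candidate sizes per axis between min and max in increments of step (fixed axes use step 0)."""
    per_axis = [range(lo, hi + 1, st) if st > 0 else [lo] for lo, hi, st in zip(min_shape, max_shape, step)]
    return list(itertools.product(*per_axis))

def candidate_samples(inputs):
    """Cartesian product over the candidate shapes of every parameterized input tensor."""
    return list(itertools.product(*(candidate_shapes(*spec) for spec in inputs)))

# e.g. two parameterized inputs, each given as (min_shape, max_shape, step):
samples = candidate_samples([
    ((1, 1, 64, 64), (1, 1, 256, 256), (0, 0, 32, 32)),
    ((1, 3, 64, 64), (1, 3, 128, 128), (0, 0, 64, 64)),
])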

@thodkatz thodkatz marked this pull request as draft October 11, 2024 18:01